Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ACL cache and debouncing request queues for creation/deletion requests to make parallelism useful #357

Merged
merged 5 commits into from
Apr 8, 2024

Conversation

eh-steve
Copy link

@eh-steve eh-steve commented Nov 7, 2023

When creating/destroying 1000 ACLs:

On a real cluster with ~80k ACLs

Provides 89000% speedup for ACL creations (~8000s -> ~9s)

On a fresh cluster with no ACLs yet:

Provides a 780% speedup for ACL creations and 1470% speedup for ACL deletions .

Before:

time terraform apply -auto-approve -parallelism=1000 
25.98s user 18.43s system 83% cpu 53.286 total

time terraform destroy -parallelism=1000 -auto-approve  
39.98s user 19.88s system 32% cpu 3:04.59 total

After:

time terraform apply -parallelism=1000 -auto-approve
25.81s user 15.20s system 603% cpu 6.798 total

time terraform destroy -parallelism=1000 -auto-approve 
30.12s user 13.22s system 344% cpu 12.570 total

For anyone facing the same issue, I've published a release here until this PR is reviewed/merged:
https://registry.terraform.io/providers/eh-steve/kafka/latest

@eh-steve eh-steve force-pushed the acl-cache-and-batch branch 4 times, most recently from bd5a211 to 45c9c20 Compare November 8, 2023 09:59
@eh-steve
Copy link
Author

eh-steve commented Jan 2, 2024

Not sure if the CI failures are legit, as I have tests passing locally...?

@SanchosPancho
Copy link
Contributor

SanchosPancho commented Feb 15, 2024

I've test your PR and seems something wrong with it. After plan I've got an error:


│ Error: kafka: client has run out of available brokers to talk to: 5 errors occurred:
│ 	* EOF
│ 	* EOF
│ 	* EOF
│ 	* EOF
│ 	* EOF

Same error in the pipeline tests https://github.com/Mongey/terraform-provider-kafka/actions/runs/7386689599/job/20093815910?pr=357

Without this patch all works good

@eh-steve eh-steve force-pushed the acl-cache-and-batch branch 2 times, most recently from 6a84564 to be82e9b Compare February 19, 2024 11:42
@eh-steve
Copy link
Author

eh-steve commented Feb 19, 2024

@SanchosPancho I'd been running make testacc locally against the docker-compose'd brokers which was working.

When running testacc under docker, it does indeed fail, but seemingly from being unable to reach the brokers on localhost:9092. It seems like the env var for KAFKA_BOOTSTRAP_SERVERS wasn't correctly set or used in the Makefile, so the provider wasn't using the internal advertised listeners (kafka-{1..3}:9090), instead attempting to use the external localhost:9092, which wouldn't be reachable inside the testacc container without setting network: host...

I've made changes in be82e9b which should allow the testacc container to reach the brokers via the docker-compose network on 9090 (this would let the old CircleCI test approach work again).

I've just been running

docker compose -f docker-compose.yaml -f docker-compose.testacc.yaml up --abort-on-container-exit testacc

successfully on my end.

I think some of the tests are already racy anyway, but I've run this 5 times now and got 5 passes in a row. Could you try on your side?

Also sounds like some overlap with #383 which is nothing to do with this PR?

@SanchosPancho
Copy link
Contributor

SanchosPancho commented Feb 19, 2024

@eh-steve I've try it on real cluster, but I think that depends on kafka version, I've fail on old 2.3+ kafka cluster
seems provider 0.6 doesn't work correctly with old kafka instances

@eh-steve
Copy link
Author

Yeah I guess if you wanted to test this PR you could locally cherry pick the first commit onto an older master.

I've also released it in the registry under eh-steve/kafka version 0.5.5 so you could test with that. The parent master commit for that release is 993494b, which may have already introduced the problem though?

@eh-steve eh-steve force-pushed the acl-cache-and-batch branch from 4d9297a to e94ca55 Compare February 29, 2024 15:53
@akaltsikis
Copy link

Hey @eh-steve , Thanks for the above optimizations. Does the above reduce the time execution, on regular terraform plan commands for clusters that have tons of ACLs ?

BTW is there any help you need, to push forward with this PR?

@eh-steve
Copy link
Author

Hey @eh-steve , Thanks for the above optimizations. Does the above reduce the time execution, on regular terraform plan commands for clusters that have tons of ACLs ?

Yeah it should

BTW is there any help you need, to push forward with this PR?

Just waiting for an approval and a CI run

@eh-steve
Copy link
Author

eh-steve commented Apr 4, 2024

Hi @Mongey, thanks for the run, I can’t reproduce these CI failures locally and one of them seems to be related to GPG? Is master CI passing?

@Mongey
Copy link
Owner

Mongey commented Apr 4, 2024

hey @eh-steve thanks for the PR! looks great!
The build snapshots never really works for PR's due to GPG, I generally just ignore that failure.

I pulled the branch down locally and tested -- it seems to work for me.
Not sure what's going on with CI - it is quite flakey

@Mongey Mongey merged commit 40dacea into Mongey:master Apr 8, 2024
2 of 4 checks passed
@bmaximuml bmaximuml mentioned this pull request Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants